This report collects and examines selected techniques for imputing missing data, focusing on how they affect the predictive performance of classification algorithms. It covers both basic techniques (median/mode imputation) and more sophisticated ones (selected methods from the mice, VIM, missRanger and softImpute packages).
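
To make the basic technique concrete, here is a minimal Python sketch of median/mode imputation. It is not the code used to produce this report. The helper names `fit_basic_imputer` and `apply_basic_imputer`, the toy rows, and the choice to learn fill values on the train set and reuse them on the test set are all assumptions made for illustration.

```python
from statistics import median, mode


def fit_basic_imputer(rows):
    """Learn one fill value per column: median for numeric columns, mode otherwise.

    `rows` is a list of dicts sharing the same keys; None marks a missing value.
    (Hypothetical helper, not taken from the report.)
    """
    fill = {}
    for col in rows[0].keys():
        observed = [r[col] for r in rows if r[col] is not None]
        if not observed:
            continue  # an entirely missing column gives nothing to learn from
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in observed
        )
        fill[col] = median(observed) if is_numeric else mode(observed)
    return fill


def apply_basic_imputer(rows, fill):
    """Return copies of `rows` with missing values replaced by the learned fill values."""
    return [
        {col: (fill.get(col) if value is None else value) for col, value in row.items()}
        for row in rows
    ]


# Toy data, invented for this example.
train = [
    {"age": 25, "job": "clerk"},
    {"age": None, "job": "clerk"},
    {"age": 40, "job": None},
    {"age": 31, "job": "nurse"},
]
test = [{"age": None, "job": None}]

fill_values = fit_basic_imputer(train)       # assumption: statistics come from train only
train_imputed = apply_basic_imputer(train, fill_values)
test_imputed = apply_basic_imputer(test, fill_values)
```

Reusing the train-set fill values on the test set keeps information from the test rows out of the imputation step. Whether the report's pipeline follows the same arrangement is not stated here.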
As the classification algorithm for testing we used ranger, a fast implementation of random forests that is particularly suited to high-dimensional data. Predictive performance was assessed using AUC, balanced accuracy (BACC) and the Matthews correlation coefficient (MCC).
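
As a reading aid for the AUC, BACC and MCC values reported below, the following Python sketch computes the three measures for a binary problem using their standard definitions. It is an illustration only and not the evaluation code behind the report. The function names, the toy labels and scores, and the 0.5 decision threshold are assumptions.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def balanced_accuracy(tp, fp, tn, fn):
    """Mean of sensitivity and specificity; assumes both classes occur in y_true."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def matthews_corrcoef(tp, fp, tn, fn):
    """MCC in [-1, 1]; returns 0.0 when the denominator vanishes (a common convention)."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def auc(y_true, scores, positive=1):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5.

    Quadratic in the number of samples, which is fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and predicted scores, invented for this example.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1, 0.35, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

counts = confusion_counts(y_true, y_pred)
bacc_value = balanced_accuracy(*counts)
mcc_value = matthews_corrcoef(*counts)
auc_value = auc(y_true, scores)
```

Under these definitions, an AUC or BACC close to 0.5 and an MCC close to 0 all correspond to chance-level prediction.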

The report contains all the results, grouped by package and, within each package, by dataset.

Basic (median/mode)

adult

Crossvalidation results

Imputation times

## Train set imputation time:  0.14
## Test set imputation time:  0.218

Test set results

## Test set AUC:  0.916
## Test set BACC:  0.781
## Test set MCC:  0.604

Missings overview

eucalyptus

Crossvalidation results

Imputation times

## Train set imputation time:  0.005
## Test set imputation time:  0.004

Test set results

## Test set AUC:  0.951
## Test set BACC:  0.889
## Test set MCC:  0.783

Missings overview

dresses-sales

Crossvalidation results

Imputation times

## Train set imputation time:  0.005
## Test set imputation time:  0.005

Test set results

## Test set AUC:  0.575
## Test set BACC:  0.591
## Test set MCC:  0.215

Missings overview

credit-approval

Crossvalidation results

Imputation times

## Train set imputation time:  0.006
## Test set imputation time:  0.005

Test set results

## Test set AUC:  0.93
## Test set BACC:  0.874
## Test set MCC:  0.752

Missings overview

sick

Crossvalidation results

Imputation times

## Train set imputation time:  0.035
## Test set imputation time:  0.015

Test set results

## Test set AUC:  0.996
## Test set BACC:  0.927
## Test set MCC:  0.898

Missings overview

SpeedDating

Crossvalidation results

Imputation times

## Train set imputation time:  0.089
## Test set imputation time:  0.044

Test set results

## Test set AUC:  1
## Test set BACC:  0.98
## Test set MCC:  0.976

Missings overview

cylinder-bands

Crossvalidation results

Imputation times

## Train set imputation time:  0.01
## Test set imputation time:  0.008

Test set results

## Test set AUC:  0.903
## Test set BACC:  0.667
## Test set MCC:  0.459

Missings overview

Mice

adult

Crossvalidation results

Imputation times

## Train set imputation time:  5.385
## Test set imputation time:  0.72

Test set results

## Test set AUC:  0.915
## Test set BACC:  0.776
## Test set MCC:  0.598

Missings overview

eucalyptus

Crossvalidation results

Imputation times

## Train set imputation time:  0.097
## Test set imputation time:  0.061

Test set results

## Test set AUC:  0.966
## Test set BACC:  0.903
## Test set MCC:  0.81

Missings overview

dresses-sales

Crossvalidation results

Imputation times

## Train set imputation time:  0.361
## Test set imputation time:  0.187

Test set results

## Test set AUC:  0.494
## Test set BACC:  0.492
## Test set MCC:  -0.017

Missings overview

credit-approval

Crossvalidation results

Imputation times

## Train set imputation time:  0.077
## Test set imputation time:  0.073

Test set results

## Test set AUC:  0.905
## Test set BACC:  0.85
## Test set MCC:  0.697

Missings overview

sick

Crossvalidation results

Imputation times

## Train set imputation time:  0.257
## Test set imputation time:  0.094

Test set results

## Test set AUC:  0.996
## Test set BACC:  0.907
## Test set MCC:  0.874

Missings overview

SpeedDating

Crossvalidation results

Imputation times

## Train set imputation time:  174.467
## Test set imputation time:  49.422

Test set results

## Test set AUC:  1
## Test set BACC:  0.969
## Test set MCC:  0.963

Missings overview

cylinder-bands

Crossvalidation results

Imputation times

## Train set imputation time:  1556.293
## Test set imputation time:  103.295

Test set results

## Test set AUC:  0.931
## Test set BACC:  0.83
## Test set MCC:  0.69

Missings overview

VIM (K-Nearest Neighbors)

adult

Crossvalidation results

Imputation times

## Train set imputation time:  114.478
## Test set imputation time:  7.295

Test set results

## Test set AUC:  0.914
## Test set BACC:  0.775
## Test set MCC:  0.595

Missings overview

eucalyptus

Crossvalidation results

Imputation times

## Train set imputation time:  0.304
## Test set imputation time:  0.118

Test set results

## Test set AUC:  0.966
## Test set BACC:  0.889
## Test set MCC:  0.783

Missings overview

dresses-sales

Crossvalidation results

Imputation times

## Train set imputation time:  0.526
## Test set imputation time:  0.123

Test set results

## Test set AUC:  0.589
## Test set BACC:  0.607
## Test set MCC:  0.234

Missings overview

credit-approval

Crossvalidation results

Imputation times

## Train set imputation time:  0.113
## Test set imputation time:  0.06

Test set results

## Test set AUC:  0.923
## Test set BACC:  0.83
## Test set MCC:  0.667

Missings overview

sick

Crossvalidation results

Imputation times

## Train set imputation time:  6.307
## Test set imputation time:  0.558

Test set results

## Test set AUC:  0.996
## Test set BACC:  0.927
## Test set MCC:  0.898

Missings overview

SpeedDating

Crossvalidation results

Imputation times

## Train set imputation time:  275.061
## Test set imputation time:  20.435

Test set results

## Test set AUC:  1
## Test set BACC:  0.978
## Test set MCC:  0.974

Missings overview

cylinder-bands

Crossvalidation results

Imputation times

## Train set imputation time:  0.966
## Test set imputation time:  0.4

Test set results

## Test set AUC:  0.926
## Test set BACC:  0.814
## Test set MCC:  0.648

Missings overview

VIM (Hot Deck)

adult

Crossvalidation results

Imputation times

## Train set imputation time:  0.09
## Test set imputation time:  0.04

Test set results

## Test set AUC:  0.914
## Test set BACC:  0.779
## Test set MCC:  0.605

Missings overview

eucalyptus

Crossvalidation results

Imputation times

## Train set imputation time:  0.032
## Test set imputation time:  0.028

Test set results

## Test set AUC:  0.965
## Test set BACC:  0.901
## Test set MCC:  0.811

Missings overview

dresses-sales

Crossvalidation results

Imputation times

## Train set imputation time:  0.038
## Test set imputation time:  0.03

Test set results

## Test set AUC:  0.594
## Test set BACC:  0.594
## Test set MCC:  0.207

Missings overview

credit-approval

Crossvalidation results

Imputation times

## Train set imputation time:  0.032
## Test set imputation time:  0.028

Test set results

## Test set AUC:  0.933
## Test set BACC:  0.868
## Test set MCC:  0.733

Missings overview

sick

Crossvalidation results

Imputation times

## Train set imputation time:  0.052
## Test set imputation time:  0.045

Test set results

## Test set AUC:  0.995
## Test set BACC:  0.917
## Test set MCC:  0.886

Missings overview

SpeedDating

Crossvalidation results

Imputation times

## Train set imputation time:  0.572
## Test set imputation time:  0.315

Test set results

## Test set AUC:  1
## Test set BACC:  0.964
## Test set MCC:  0.956

Missings overview

cylinder-bands

Crossvalidation results

Imputation times

## Train set imputation time:  0.068
## Test set imputation time:  0.064

Test set results

## Test set AUC:  0.922
## Test set BACC:  0.849
## Test set MCC:  0.708

Missings overview

missRanger

adult

Crossvalidation results

Imputation times

## Train set imputation time:  42.681
## Test set imputation time:  7.797

Test set results

## Test set AUC:  0.915
## Test set BACC:  0.78
## Test set MCC:  0.603

Missings overview

eucalyptus

Crossvalidation results

Imputation times

## Train set imputation time:  1.859
## Test set imputation time:  0.643

Test set results

## Test set AUC:  0.957
## Test set BACC:  0.889
## Test set MCC:  0.783

Missings overview

dresses-sales

Crossvalidation results

Imputation times

## Train set imputation time:  1.728
## Test set imputation time:  0.364

Test set results

## Test set AUC:  0.58
## Test set BACC:  0.53
## Test set MCC:  0.065

Missings overview

credit-approval

Crossvalidation results

Imputation times

## Train set imputation time:  1.313
## Test set imputation time:  0.482

Test set results

## Test set AUC:  0.962
## Test set BACC:  0.898
## Test set MCC:  0.797

Missings overview

sick

Crossvalidation results

Imputation times

## Train set imputation time:  11.328
## Test set imputation time:  1.766

Test set results

## Test set AUC:  0.997
## Test set BACC:  0.927
## Test set MCC:  0.898

Missings overview

SpeedDating

Crossvalidation results

Imputation times

## Train set imputation time:  564.31
## Test set imputation time:  90.004

Test set results

## Test set AUC:  1
## Test set BACC:  0.968
## Test set MCC:  0.961

Missings overview

cylinder-bands

Crossvalidation results

Imputation times

## Train set imputation time:  4.922
## Test set imputation time:  1.889

Test set results

## Test set AUC:  0.934
## Test set BACC:  0.845
## Test set MCC:  0.708

Missings overview

softImpute

adult

Crossvalidation results

Imputation times

## Train set imputation time:  0.097
## Test set imputation time:  0.022

Test set results

## Test set AUC:  0.916
## Test set BACC:  0.781
## Test set MCC:  0.604

Missings overview

eucalyptus

Crossvalidation results

Imputation times

## Train set imputation time:  0.01
## Test set imputation time:  0.005

Test set results

## Test set AUC:  0.932
## Test set BACC:  0.839
## Test set MCC:  0.676

Missings overview

dresses-sales

Crossvalidation results

Imputation times

## Train set imputation time:  0.009
## Test set imputation time:  0.007

Test set results

## Test set AUC:  0.556
## Test set BACC:  0.542
## Test set MCC:  0.104

Missings overview

credit-approval

Crossvalidation results

Imputation times

## Train set imputation time:  0.008
## Test set imputation time:  0.006

Test set results

## Test set AUC:  0.926
## Test set BACC:  0.874
## Test set MCC:  0.752

Missings overview

sick

Crossvalidation results

Imputation times

## Train set imputation time:  0.081
## Test set imputation time:  0.024

Test set results

## Test set AUC:  0.997
## Test set BACC:  0.927
## Test set MCC:  0.898

Missings overview

SpeedDating

Crossvalidation results

Imputation times

## Train set imputation time:  0.952
## Test set imputation time:  0.276

Test set results

## Test set AUC:  1
## Test set BACC:  0.98
## Test set MCC:  0.976

Missings overview

cylinder-bands

Crossvalidation results

Imputation times

## Train set imputation time:  0.019
## Test set imputation time:  0.015

Test set results

## Test set AUC:  0.911
## Test set BACC:  0.748
## Test set MCC:  0.593

Missings overview